Abstract


This document is a product of the Path Aware Networking Research Group (PANRG). At the first 
meeting of the PANRG, the Research Group agreed to catalog and analyze past efforts to develop 
and deploy Path Aware techniques, most of which were unsuccessful or at most partially 
successful, in order to extract insights and lessons for Path Aware networking researchers. 


This document contains that catalog and analysis. 


Status of This Memo 


This document is not an Internet Standards Track specification; it is published for informational 
purposes. 


This document is a product of the Internet Research Task Force (IRTF). The IRTF publishes the 
results of Internet-related research and development activities. These results might not be 
suitable for deployment. This RFC represents the consensus of the Path Aware Networking 
Research Group of the Internet Research Task Force (IRTF). Documents approved for publication 
by the IRSG are not candidates for any level of Internet Standard; see Section 2 of RFC 7841. 


Information about the current status of this document, any errata, and how to provide feedback
on it may be obtained at https://www.rfc-editor.org/info/rfc9049.


Copyright Notice 


Copyright (c) 2021 IETF Trust and the persons identified as the document authors. All rights 
reserved. 
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF
Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this 
document. Please review these documents carefully, as they describe your rights and restrictions 
with respect to this document. 
1. Introduction


This document describes the lessons that IETF participants have learned (and learned the hard 
way) about Path Aware networking over a period of several decades. It also provides an analysis 
of reasons why various Path Aware techniques have seen limited or no deployment. 


This document represents the consensus of the Path Aware Networking Research Group 
(PANRG). 


1.1. What Do "Path" and "Path Awareness" Mean in This Document? 


One of the first questions reviewers of this document have asked is "What's the definition of a 
Path, and what's the definition of Path Awareness?" That is not an easy question to answer for 
this document. 


These terms have definitions in other PANRG documents [PANRG] and are still the subject of 
some discussion in the Research Group, as of the date of this document. But because this 
document reflects work performed over several decades, the technologies described in Section 6 
significantly predate the current definitions of "Path" and "Path Aware" in use in the Path Aware 
Networking Research Group, and it is unlikely that all the contributors to Section 6 would have 
had the same understanding of these terms. Those technologies were considered "Path Aware" in 
early PANRG discussions and so are included in this retrospective document. 


It is worth noting that the definitions of "Path" and "Path Aware" in [PANRG-PATH-PROPERTIES]
would apply to Path Aware techniques at a number of levels of the Internet protocol architecture
([RFC1122], plus several decades of refinements), but the contributions received for this
document tended to target the transport layer and to treat a "Path" constructed by routers as
opaque. It would be useful to consider how applicable the Lessons Learned cataloged in this
document are at other layers; that would be a fine topic for follow-on research.


The current definition of "Path" in the Path Aware Networking Research Group appears in 
Section 2 ("Terminology") in [PANRG-PATH-PROPERTIES]. That definition is included here as a 
convenience to the reader. 


Path: A sequence of adjacent path elements over which a packet can be transmitted, 
starting and ending with a node. A path is unidirectional. Paths are time-dependent, i.e., 
the sequence of path elements over which packets are sent from one node to another
may change. A path is defined between two nodes. For multicast or broadcast, a packet 
may be sent by one node and received by multiple nodes. In this case, the packet is sent 
over multiple paths at once, one path for each combination of sending and receiving 
node; these paths do not have to be disjoint. Note that an entity may have only partial 
visibility of the path elements that comprise a path and visibility may change over time. 
Different entities may have different visibility of a path and/or treat path elements at 
different levels of abstraction. 
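As an illustration only, the definition above can be modeled as a small data structure. This is a sketch under assumptions of our own. The names `PathElement`, `Path`, and `shared_elements` are hypothetical and do not come from [PANRG-PATH-PROPERTIES], and path elements are reduced to labeled nodes and links with no properties attached.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class PathElement:
    """Hypothetical path element: a labeled node or link, nothing more."""
    kind: str  # "node" or "link" (an assumption of this sketch)
    name: str


@dataclass(frozen=True)
class Path:
    """A unidirectional sequence of adjacent path elements."""
    elements: Tuple[PathElement, ...]

    def __post_init__(self) -> None:
        # The definition requires a path to start and end with a node.
        if not self.elements:
            raise ValueError("a path needs at least one element")
        if self.elements[0].kind != "node" or self.elements[-1].kind != "node":
            raise ValueError("a path must start and end with a node")

    @property
    def source(self) -> PathElement:
        return self.elements[0]

    @property
    def destination(self) -> PathElement:
        return self.elements[-1]


def shared_elements(paths: Iterable[Path]) -> Set[PathElement]:
    """Elements common to every given path; multicast paths need not be disjoint."""
    element_sets = [set(p.elements) for p in paths]
    return set.intersection(*element_sets) if element_sets else set()


# One multicast packet from S to receivers A and B travels over two paths,
# one per sending/receiving pair, and both paths share S, the S-R link, and R.
S, R, A, B = (PathElement("node", n) for n in "SRAB")
s_r = PathElement("link", "S-R")
to_a = Path((S, s_r, R, PathElement("link", "R-A"), A))
to_b = Path((S, s_r, R, PathElement("link", "R-B"), B))
```

Using frozen dataclasses keeps elements hashable. Overlap between paths can then be computed with ordinary set operations.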


The current definition of Path Awareness, used by the Path Aware Networking Research Group, 
appears in Section 1.1 ("Definition") in [PANRG-QUESTIONS]. That definition is included here as a 
convenience to the reader. 


For purposes of this document, "path aware networking" describes endpoint discovery 
of the properties of paths they use for communication across an internetwork, and 
endpoint reaction to these properties that affects routing and/or data transfer. Note that 
this can and already does happen to some extent in the current Internet architecture; 
this definition expands current techniques of path discovery and manipulation to cross 
administrative domain boundaries and up to the transport and application layers at the 
endpoints. 


Expanding on this definition, a "path aware internetwork" is one in which endpoint 
discovery of path properties and endpoint selection of paths used by traffic exchanged 
by the endpoint are explicitly supported, regardless of the specific design of the protocol 
features which enable this discovery and selection. 


2. A Perspective on This Document 


At the first meeting of the Path Aware Networking Research Group [PANRG], at IETF 99 
[PANRG-99], Olivier Bonaventure led a discussion of "A Decade of Path Awareness" [PATH-Decade],
on attempts, which were mostly unsuccessful for a variety of reasons, to exploit Path
Aware techniques and achieve a variety of goals over the past decade. At the end of that 
discussion, two things were abundantly clear. 


* The Internet community has accumulated considerable experience with many Path Aware
techniques over a long period of time, and

* Although some Path Aware techniques have been deployed (for example, Differentiated
Services, or Diffserv [RFC2475]), most of these techniques haven't seen widespread adoption
and deployment. Even "successful" techniques like Diffserv can face obstacles that prevent
wider usage. The reasons for non-adoption and limited adoption and deployment are many
and are worthy of study.
The meta-lessons from that experience were as follows:


* Path Aware networking has been more Research than Engineering, so establishing an IRTF
Research Group for Path Aware networking was the right thing to do [RFC7418].


* Analyzing a catalog of past experience to learn the reasons for non-adoption would be a
great first step for the Research Group.


Allison Mankin, as IRTF Chair, officially chartered the Path Aware Networking Research Group in 
July 2018. 


This document contains the analysis performed by that Research Group (Section 4), based on that 
catalog (Section 6). 


2.1. Notes for the Reader 


This Informational document discusses Path Aware protocol mechanisms considered, and in 
some cases standardized, by the Internet Engineering Task Force (IETF), and it considers Lessons 
Learned from those mechanisms. The intention is to inform the work of protocol designers, 
whether in the IRTF, the IETF, or elsewhere in the Internet ecosystem. 


As an Informational document published in the IRTF Stream, this document has no authority 
beyond the quality of the analysis it contains. 


2.2. A Note about Path Aware Techniques Included in This Document 


This document does not catalog every proposed Path Aware technique that was not adopted and 
deployed. Instead, we limited our focus to technologies that passed through the IETF community 
and still identified enough techniques to provide background for the lessons included in Section 
4 to inform researchers and protocol engineers in their work. 


No shame is intended for the techniques included in this document. As shown in Section 4, the 
quality of specific techniques had little to do with whether they were deployed or not. Based on 
the techniques cataloged in this document, it is likely that when these techniques were put 
forward, the proponents were trying to engineer something that could not be engineered 
without first carrying out research. Actual shame would be failing to learn from experience and 
failing to share that experience with other networking researchers and engineers. 


2.3. Architectural Guidance 


As background for understanding the Lessons Learned contained in this document, the reader is 
encouraged to become familiar with the Internet Architecture Board's documents on "What 
Makes for a Successful Protocol?" [RFC5218] and "Planning for Protocol Adoption and 
Subsequent Transitions" [RFC8170]. 


Although these two documents do not specifically target Path Aware networking protocols, they 
are helpful resources for readers seeking to improve their understanding of considerations for 
successful adoption and deployment of any protocol. For example, the basic success factors 
described in Section 2.1 of [RFC5218] are helpful for readers of this document. 
Because there is an economic aspect to decisions about deployment, the IAB Workshop on
Internet Technology Adoption and Transition [ITAT] report [RFC7305] also provides food for 
thought. 


Several of the Lessons Learned in Section 4 reflect considerations described in [RFC5218], 
[RFC7305], and [RFC8170]. 


2.4. Terminology Used in This Document 


The terms "node" and "element" in this document have the meaning defined in
[PANRG-PATH-PROPERTIES].


2.5. Methodology for Contributions 


This document grew out of contributions by various IETF participants with experience with one 
or more Path Aware techniques. 


There are many things that could be said about the Path Aware techniques that have been 
developed. For the purposes of this document, contributors were requested to provide 


* the name of a technique, including an abbreviation if one was used.

* if available, a long-term pointer to the best reference describing the technique.

* a short description of the problem the technique was intended to solve.

* a short description of the reasons why the technique wasn't adopted.

* a short statement of the lessons that researchers can learn from our experience with this
technique.


3. Applying the Lessons We've Learned 


The initial scope for this document was roughly "What mistakes have we made in the decade 
prior to [PANRG-99], that we shouldn't make again?" Some of the contributions in Section 6 
predate the initial scope. The earliest Path Aware technique referred to in Section 6 is [IEN-119],
which was published in the late 1970s; see Section 6.1. Given that the networking ecosystem has 
evolved continuously, it seems reasonable to consider how to apply these lessons. 


The PANRG reviewed the Lessons Learned (Section 4) contained in the May 23, 2019 draft version 
of this document at IETF 105 [PANRG-105-Min] and carried out additional discussion at IETF 106 
[PANRG-106-Min]. Table 1 provides the "sense of the room" about each lesson after those 
discussions. The intention was to capture whether a specific lesson seems to be 


* "Invariant" - well understood and likely to be applicable for any proposed Path Aware
networking solution.

* "Variable" - has impeded deployment in the past but might not be applicable in a specific
technique. Engineering analysis to understand whether the lesson is applicable is prudent.

* "Not Now" - a characteristic that tends to turn up a minefield full of dragons. Prudent
network engineers will wish to avoid gambling on a technique that relies on this, until
something significant changes.


Section 6.9 on Explicit Congestion Notification (ECN) was added during the review and approval 
process, based on a question from Martin Duke. Section 6.9, as contained in the March 8, 2021 
draft version of this document, was discussed at [PANRG-110] and is summarized in Section 4.13, 
describing a new Lesson Learned. 


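+============================================================+===========+
| Lesson                                                     | Category  |
+============================================================+===========+
| Justifying Deployment (Section 4.1)                        | Invariant |
+------------------------------------------------------------+-----------+
| Providing Benefits for Early Adopters (Section 4.2)        | Invariant |
+------------------------------------------------------------+-----------+
| Providing Benefits during Partial Deployment (Section 4.3) | Invariant |
+------------------------------------------------------------+-----------+
| Outperforming End-to-End Protocol Mechanisms (Section 4.4) | Variable  |
+------------------------------------------------------------+-----------+
| Paying for Path Aware Techniques (Section 4.5)             | Invariant |
+------------------------------------------------------------+-----------+
| Impact on Operational Practices (Section 4.6)              | Invariant |
+------------------------------------------------------------+-----------+
| Per-Connection State (Section 4.7)                         | Variable  |
+------------------------------------------------------------+-----------+
| Keeping Traffic on Fast Paths (Section 4.8)                | Variable  |
+------------------------------------------------------------+-----------+
| Endpoints Trusting Intermediate Nodes (Section 4.9)        | Not Now   |
+------------------------------------------------------------+-----------+
| Intermediate Nodes Trusting Endpoints (Section 4.10)       | Not Now   |
+------------------------------------------------------------+-----------+
| Reacting to Distant Signals (Section 4.11)                 | Variable  |
+------------------------------------------------------------+-----------+
| Support in Endpoint Protocol Stacks (Section 4.12)         | Variable  |
+------------------------------------------------------------+-----------+
| Planning for Failure (Section 4.13)                        | Invariant |
+------------------------------------------------------------+-----------+

Table 1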


"Justifying Deployment", "Providing Benefits for Early Adopters", "Paying for Path Aware 
Techniques", "Impact on Operational Practices", and "Planning for Failure" were considered to be 
Invariant -- the sense of the room was that these would always be considerations for any 
proposed Path Aware technique. 


"Providing Benefits during Partial Deployment" was added after IETF 105, during Research 
Group Last Call, and is also considered to be Invariant. 
For "Outperforming End-to-End Protocol Mechanisms", there is a trade-off between improved
performance from Path Aware techniques and additional complexity required by some Path 
Aware techniques. 


* For example, if you can obtain the same understanding of path characteristics from
measurements obtained over a few more round trips, endpoint implementers are unlikely to
be eager to add complexity, and many attributes can be measured from an endpoint, without
assistance from intermediate nodes.
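Round-trip time is one example of an attribute that an endpoint can estimate on its own, using only the timing of its own packets and acknowledgments. The sketch below applies the smoothed RTT estimator from RFC 6298 (alpha = 1/8, beta = 1/4, K = 4, and a 1-second minimum RTO). Choosing this particular estimator, and the class name `RttEstimator`, are our own illustrative decisions, not part of any technique in Section 6.

```python
from typing import Optional


class RttEstimator:
    """Endpoint-only RTT estimation using the RFC 6298 update rules."""

    ALPHA = 1 / 8
    BETA = 1 / 4
    K = 4
    MIN_RTO = 1.0  # seconds; RFC 6298 says RTO SHOULD be rounded up to 1 s

    def __init__(self, clock_granularity: float = 0.001) -> None:
        self.g = clock_granularity
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None

    def update(self, r: float) -> float:
        """Feed one RTT sample in seconds; return the resulting RTO."""
        if self.srtt is None or self.rttvar is None:
            # The first measurement initializes both estimators.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated using the previous SRTT; then SRTT is updated.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + max(self.g, self.K * self.rttvar)
        return max(rto, self.MIN_RTO)


est = RttEstimator()
for sample in (0.100, 0.120):  # RTT samples measured by the endpoint itself
    rto = est.update(sample)
```

No intermediate node takes part in this estimate, which is the point made in the bullet above.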


For "Per-Connection State", the key questions discussed in the Research Group were "how much 
state" and "where state is maintained". 


* Integrated Services (IntServ) (Section 6.2) required state at every participating intermediate
node for every connection between two endpoints. As the Internet ecosystem has evolved,
carrying many connections in a tunnel that appears to intermediate nodes as a single
connection has become more common, so that additional end-to-end connections don't add
additional state to intermediate nodes between tunnel endpoints. If these tunnels are
encrypted, intermediate nodes between tunnel endpoints can't distinguish between
connections, even if that were desirable.
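One way to illustrate the effect of tunneling on per-connection state is to count the distinct flow keys that an intermediate node between tunnel endpoints would have to track. This is a toy model built on our own assumptions. A flow is identified by its 5-tuple, and a tunnel replaces every inner 5-tuple with one outer 5-tuple. The names `FiveTuple` and `visible_flow_state`, the addresses (taken from documentation ranges), and the port 4500 are all illustrative.

```python
from typing import Iterable, NamedTuple, Optional, Set


class FiveTuple(NamedTuple):
    src: str
    dst: str
    proto: int
    sport: int
    dport: int


def visible_flow_state(flows: Iterable[FiveTuple],
                       tunnel: Optional[FiveTuple] = None) -> Set[FiveTuple]:
    """Flow keys a per-flow node between the tunnel endpoints would have to track.

    Without a tunnel, each end-to-end connection is its own key. Inside a
    tunnel (especially an encrypted one), the node sees only the outer
    header, so all connections collapse into a single key.
    """
    if tunnel is None:
        return set(flows)
    return {tunnel for _ in flows}


# 1000 end-to-end connections from one host (hypothetical addresses).
conns = [FiveTuple("192.0.2.1", "198.51.100.9", 6, 40000 + i, 443) for i in range(1000)]
# Hypothetical outer header of a UDP-encapsulated tunnel.
outer = FiveTuple("192.0.2.254", "203.0.113.1", 17, 4500, 4500)

print(len(visible_flow_state(conns)))         # 1000
print(len(visible_flow_state(conns, outer)))  # 1
```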


For "Keeping Traffic on Fast Paths", we noted that this was true for many platforms, but not for 
all. 


* For backbone routers, this is likely an Invariant, but for platforms that rely more on
general-purpose computers to make forwarding decisions, this may not be a fatal flaw for Path
Aware techniques.


For "Endpoints Trusting Intermediate Nodes" and "Intermediate Nodes Trusting Endpoints", 
these lessons point to the broader need to revisit the Internet Threat Model. 


* We noted with relief that discussions about this were already underway in the IETF
community at IETF 105 (see the Security Area Open Meeting minutes [SAAG-105-Min] for
discussion of [INTERNET-THREAT-MODEL] and [FARRELL-ETM]), and the Internet
Architecture Board has created a mailing list for continued discussions [model-t], but we
recognize that there are Path Aware networking aspects of this effort, requiring research.


For "Reacting to Distant Signals", we noted that not all attributes are equal. 


* If an attribute is stable over an extended period of time, is difficult to observe via end-to-end
mechanisms, and is valuable, Path Aware techniques that rely on that attribute to provide a
significant benefit become more attractive.

* Analysis to help identify attributes that are useful enough to justify deployment of Path
Aware techniques that make use of those attributes would be helpful.
For "Support in Endpoint Protocol Stacks", we noted that Path Aware applications must be able to
identify and communicate requirements about path characteristics. 


* The de facto sockets API has no way of signaling application expectations for the network
path to the protocol stack.


4. Summary of Lessons Learned 


This section summarizes the Lessons Learned from the contributed subsections in Section 6. 


Each Lesson Learned is tagged with one or more contributions that encountered this obstacle as 
a significant impediment to deployment. Other contributed techniques may have also 
encountered this obstacle, but this obstacle may not have been the biggest impediment to 
deployment for those techniques. 


It is useful to notice that sometimes an obstacle might impede deployment, while at other times, 
the same obstacle might prevent adoption and deployment entirely. The Research Group 
discussed distinguishing between obstacles that impede and obstacles that prevent, but it 
appears that the boundary between "impede" and "prevent" can shift over time -- some of the 
Lessons Learned are based on both a) Path Aware techniques that were not deployed and b) Path 
Aware techniques that were deployed but were not deployed widely or quickly. See Sections 6.6 
and 6.6.3 for examples of this shifting boundary. 


4.1. Justifying Deployment 


The benefit of Path Awareness must be great enough to justify making changes in an operational 
network. The colloquial U.S. American English expression, "If it ain't broke, don't fix it" is a "best 
current practice" on today's Internet. (See Sections 6.3, 6.4, 6.5, and 6.9, in addition to [RFC5218].) 


4.2. Providing Benefits for Early Adopters 


Providing benefits for early adopters can be key -- if everyone must deploy a technique in order 
for the technique to provide benefits, or even to work at all, the technique is unlikely to be 
adopted widely or quickly. (See Sections 6.2 and 6.3, in addition to [RFC5218].) 


4.3. Providing Benefits during Partial Deployment 


Some proposals require that all path elements along the full length of the path must be upgraded 
to support a new technique, before any benefits can be seen. This is likely to require 
coordination between operators who control a subset of path elements, and between operators 
and end users if endpoint upgrades are required. If a technique provides benefits when only a 
part of the path has been upgraded, this is likely to encourage adoption and deployment. (See 
Sections 6.2, 6.3, and 6.9, in addition to [RFC5218].) 
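A back-of-the-envelope calculation shows why all-or-nothing requirements hurt. Assume, purely for simplicity, that each of n path elements has been upgraded independently with probability p. A technique that needs every element upgraded then works on only a fraction p^n of paths. A technique whose benefit grows with each upgraded element yields an expected fraction p of its benefit. The independence assumption, the proportional-benefit model, and the function names are hypothetical.

```python
def all_or_nothing_fraction(p: float, n: int) -> float:
    """Fraction of paths on which all n elements are upgraded.

    Assumes, for illustration only, that each element is upgraded
    independently with probability p.
    """
    return p ** n


def incremental_fraction(p: float) -> float:
    """Expected share of upgraded elements on a path, under the same assumption.

    If every upgraded element contributes an equal share of the benefit,
    this is also the expected fraction of the full benefit.
    """
    return p


for n in (2, 5, 10, 15):
    print(n, round(all_or_nothing_fraction(0.8, n), 3), incremental_fraction(0.8))
```

Suppose 80% of elements have been upgraded. An all-or-nothing technique then works on only about 0.107 of 10-element paths (0.8 ** 10). The incremental technique still delivers 0.8 of its benefit on average.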
4.4. Outperforming End-to-End Protocol Mechanisms


Adaptive end-to-end protocol mechanisms may respond to feedback quickly enough that the 
additional realizable benefit from a new Path Aware mechanism that tries to manipulate nodes 
along a path, or observe the attributes of nodes along a path, may be much smaller than 
anticipated. (See Sections 6.3 and 6.5.) 


4.5. Paying for Path Aware Techniques 


"Follow the money." If operators can't charge for a Path Aware technique to recover the costs of 
deploying it, the benefits to the operator must be really significant. Corollary: if operators charge 
for a Path Aware technique, the benefits to users of that Path Aware technique must be 
significant enough to justify the cost. (See Sections 6.1, 6.2, 6.5, and 6.9.) 


4.6. Impact on Operational Practices 


The impact of a Path Aware technique requiring changes to operational practices can affect how 
quickly or widely a promising technique is deployed. The impacts of these changes may make 
deployment more likely, but they often discourage deployment. (See Section 6.6, including 
Section 6.6.3.) 


4.7. Per-Connection State 


Per-connection state in intermediate nodes has been an impediment to adoption and deployment 
in the past, because of added cost and complexity. Often, similar benefits can be achieved with 
much less finely grained state. This is especially true as we move from the edge of the network, 
further into the routing core. (See Sections 6.1 and 6.2.) 


4.8. Keeping Traffic on Fast Paths 


Many modern platforms, especially high-end routers, have been designed with hardware that 
can make simple per-packet forwarding decisions ("fast paths") but have not been designed to 
make heavy use of in-band mechanisms such as IPv4 and IPv6 Router Alert Options (RAOs) that 
require more processing to make forwarding decisions. Packets carrying in-band mechanisms 
are diverted to other processors in the router with much lower packet-processing rates. 
Operators can be reluctant to deploy techniques that rely heavily on in-band mechanisms 
because they may significantly reduce packet throughput. (See Section 6.7.) 
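The fast-path/slow-path split can be illustrated with a simplified classifier that sends IPv4 packets carrying the Router Alert option (option type 148, from RFC 2113) to a slower processing path. This is a sketch, not a description of any particular router. Real forwarding hardware varies, and some platforms divert every packet that carries IPv4 options. The function names and the "fast"/"slow" labels are hypothetical.

```python
from typing import Iterator, Tuple

ROUTER_ALERT = 148  # IPv4 Router Alert option type (RFC 2113)
EOOL, NOP = 0, 1    # End of Option List, No-Operation


def ipv4_options(header: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (option type, option data) for each option in an IPv4 header."""
    ihl = (header[0] & 0x0F) * 4
    if ihl < 20 or len(header) < ihl:
        raise ValueError("truncated or malformed IPv4 header")
    i = 20
    while i < ihl:
        opt = header[i]
        if opt == EOOL:
            return
        if opt == NOP:
            i += 1
            continue
        if i + 1 >= ihl:
            raise ValueError("truncated option")
        length = header[i + 1]  # length covers the type and length bytes too
        if length < 2 or i + length > ihl:
            raise ValueError("bad option length")
        yield opt, header[i + 2:i + length]
        i += length


def forwarding_path(header: bytes) -> str:
    """Hypothetical classifier: 'slow' if a Router Alert option is present."""
    if any(opt == ROUTER_ALERT for opt, _ in ipv4_options(header)):
        return "slow"
    return "fast"


plain = bytes([0x45]) + bytes(19)                          # IHL = 5, no options
alert = bytes([0x46]) + bytes(19) + bytes([148, 4, 0, 0])  # IHL = 6, Router Alert
```

Even this simplified classifier has to walk the option list one packet at a time. That per-packet variable-length parsing is the kind of work fast-path hardware is usually built to avoid.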


4.9. Endpoints Trusting Intermediate Nodes 


If intermediate nodes along the path can't be trusted, it's unlikely that endpoints will rely on 
signals from intermediate nodes to drive changes to endpoint behaviors. We note that "trust" is 
not binary -- one low level of trust applies when a node receiving a message can confirm that the 
sender of the message has visibility of the packets on the path it is seeking to control [RFC8085] 
(e.g., an ICMP Destination Unreachable message [RFC0792] that includes the Internet Header + 64 
bits of Original Data Datagram payload from the source). A higher level of trust can arise when
an endpoint has established a short-term, or even long-term, trust relationship with network 
nodes. (See Sections 6.4 and 6.5.) 
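As an example of the low level of trust described above, an endpoint can check that an ICMP Destination Unreachable message quotes a packet it actually sent before acting on it. The sketch below parses the quoted IPv4 header and the first 64 bits of its payload. It then compares the addresses, protocol, and ports with a table of the endpoint's own UDP or TCP flows. The flow-table representation and function names are our assumptions. Real stacks may apply further checks; for TCP, for instance, they may verify that the quoted sequence number falls within the current send window.

```python
import ipaddress
import struct
from typing import Set, Tuple

Flow = Tuple[str, str, int, int, int]  # (src, dst, protocol, src port, dst port)


def quoted_flow(icmp: bytes) -> Flow:
    """Extract flow identifiers from the datagram quoted in an ICMPv4 Destination Unreachable."""
    # ICMP header: type(1) code(1) checksum(2) unused(4), then the quoted datagram.
    if len(icmp) < 8 or icmp[0] != 3:
        raise ValueError("not an ICMP Destination Unreachable message")
    inner = icmp[8:]
    if len(inner) < 20:
        raise ValueError("quoted IP header truncated")
    ihl = (inner[0] & 0x0F) * 4
    if ihl < 20 or len(inner) < ihl + 8:
        raise ValueError("quote shorter than IP header + 64 bits")
    proto = inner[9]
    src = str(ipaddress.IPv4Address(inner[12:16]))
    dst = str(ipaddress.IPv4Address(inner[16:20]))
    # For UDP and TCP, the first 4 of the 8 quoted payload bytes are the ports.
    sport, dport = struct.unpack("!HH", inner[ihl:ihl + 4])
    return src, dst, proto, sport, dport


def is_plausible(icmp: bytes, own_flows: Set[Flow]) -> bool:
    """Act on the error only if it quotes a packet from one of our own flows."""
    try:
        return quoted_flow(icmp) in own_flows
    except ValueError:
        return False


# Example: a Port Unreachable (type 3, code 3) quoting a UDP packet we sent.
quoted_ip = (bytes([0x45, 0, 0, 28, 0, 0, 0, 0, 64, 17, 0, 0])
             + ipaddress.IPv4Address("192.0.2.1").packed
             + ipaddress.IPv4Address("198.51.100.7").packed)
quoted_udp = struct.pack("!HHHH", 5000, 53, 8, 0)
icmp_msg = bytes([3, 3, 0, 0, 0, 0, 0, 0]) + quoted_ip + quoted_udp
flows = {("192.0.2.1", "198.51.100.7", 17, 5000, 53)}
```

This check shows only that the sender of the ICMP message could see the original packet. It does not show that the sender is trustworthy in any stronger sense.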


4.10. Intermediate Nodes Trusting Endpoints 


If the endpoints do not have any trust relationship with the intermediate nodes along a path, 
operators have been reluctant to deploy techniques that rely on endpoints sending 
unauthenticated control signals to routers. (See Sections 6.2 and 6.7.) (We also note that this still 
remains a factor hindering deployment of Diffserv.) 


4.11. Reacting to Distant Signals 


Because the Internet is a distributed system, if the distance that information from distant path 
elements travels to a Path Aware host is sufficiently large, the information may no longer 
accurately represent the state and situation at the distant host or elements along the path when it 
is received locally. In this case, the benefit that a Path Aware technique provides will be 
inconsistent and may not always be beneficial. (See Section 6.3.) 
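One way an endpoint might guard against stale information is to discount a distant signal according to its age. The sketch below is illustrative only. The exponential decay, the use of the smoothed RTT as the time scale, the default half-life of four RTTs, and the function names are all our own assumptions, not a mechanism taken from any technique in Section 6.

```python
def signal_weight(age_s: float, srtt_s: float, half_life_rtts: float = 4.0) -> float:
    """Hypothetical weight in (0, 1] for a path signal that is age_s seconds old.

    The weight halves every half_life_rtts smoothed round-trip times, so a
    report that needed many RTTs to arrive has little influence.
    """
    if age_s < 0 or srtt_s <= 0 or half_life_rtts <= 0:
        raise ValueError("need age_s >= 0, srtt_s > 0, half_life_rtts > 0")
    return 0.5 ** (age_s / (half_life_rtts * srtt_s))


def blend(local_estimate: float, signal_value: float, age_s: float, srtt_s: float) -> float:
    """Move the local estimate toward the distant signal in proportion to its freshness."""
    w = signal_weight(age_s, srtt_s)
    return (1 - w) * local_estimate + w * signal_value
```

With a 50 ms smoothed RTT, a signal that is 200 ms old gets half weight. A signal that is 10 seconds old is effectively ignored.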


4.12. Support in Endpoint Protocol Stacks 


Just because a protocol stack provides a new feature/signal does not mean that applications will 
use the feature/signal. Protocol stacks may not know how to effectively utilize Path Aware 
techniques, because the protocol stack may require information from applications to permit the 
technique to work effectively, but applications may not a priori know that information. Even if 
the application does know that information, the de facto sockets API has no way of signaling 
application expectations for the network path to the protocol stack. In order for applications to 
provide these expectations to protocol stacks, we need an API that signals more than the packets 
to be sent. (See Sections 6.1 and 6.2.) 
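To show the gap, the sketch below models the kind of information an application would need to give its protocol stack along with the data. The de facto sockets API has no parameter for this. Everything here is hypothetical: `PathExpectations`, `CandidatePath`, `choose_path`, and their fields are invented for illustration and do not correspond to any existing API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PathExpectations:
    """Hypothetical expectations an application might state for its traffic."""
    max_latency_ms: Optional[float] = None
    min_capacity_kbps: Optional[float] = None


@dataclass(frozen=True)
class CandidatePath:
    """Hypothetical stack-side view of a path and its measured properties."""
    name: str
    latency_ms: float
    capacity_kbps: float


def choose_path(paths: List[CandidatePath],
                want: PathExpectations) -> Optional[CandidatePath]:
    """Return the lowest-latency path that meets the expectations, or None."""
    suitable = [
        p for p in paths
        if (want.max_latency_ms is None or p.latency_ms <= want.max_latency_ms)
        and (want.min_capacity_kbps is None or p.capacity_kbps >= want.min_capacity_kbps)
    ]
    return min(suitable, key=lambda p: p.latency_ms, default=None)


paths = [CandidatePath("wifi", 12.0, 50_000.0), CandidatePath("cellular", 45.0, 8_000.0)]
voice = PathExpectations(max_latency_ms=30.0, min_capacity_kbps=64.0)
chosen = choose_path(paths, voice)
```

This sketch assumes that the application knows values such as `max_latency_ms`. As the paragraph above notes, many applications do not.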


4.13. Planning for Failure 


If early implementers discover severe problems with a new feature, that feature is likely to be 
disabled, and convincing implementers to re-enable that feature can be very difficult and can 
require years or decades. In addition to testing, partial deployment for a subset of users, 
implementing instrumentation that will detect degraded user experience, and even "failback" to 
a previous version or "failover" to an entirely different implementation are likely to be helpful. 
(See Section 6.9.) 
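The partial-deployment and failback practices mentioned above can be sketched as a feature gate. The new behavior is enabled for a stable, hash-selected fraction of users and is turned off automatically if instrumentation reports too many failures. The class name, the SHA-256 bucketing, and the default thresholds are hypothetical choices made for illustration.

```python
import hashlib


class FeatureGate:
    """Hypothetical gradual-rollout gate with automatic failback."""

    def __init__(self, name: str, rollout_percent: float,
                 max_failure_rate: float = 0.05, min_samples: int = 100) -> None:
        self.name = name
        self.rollout_percent = rollout_percent
        self.max_failure_rate = max_failure_rate
        self.min_samples = min_samples
        self.attempts = 0
        self.failures = 0
        self.disabled = False

    def enabled_for(self, user_id: str) -> bool:
        """Stable per-user decision: the same user always lands in the same bucket."""
        if self.disabled:
            return False
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).digest()
        bucket = int.from_bytes(digest[:4], "big") % 10000  # 0..9999, roughly uniform
        return bucket < self.rollout_percent * 100

    def report(self, success: bool) -> None:
        """Record one outcome; fail back once the observed failure rate is too high."""
        self.attempts += 1
        if not success:
            self.failures += 1
        if (self.attempts >= self.min_samples
                and self.failures / self.attempts > self.max_failure_rate):
            self.disabled = True  # failback: revert everyone to the previous behavior
```

The `min_samples` threshold keeps a few early failures from disabling the feature prematurely. Such a premature shutdown is hard to undo, as this section notes.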


5. Future Work 


By its nature, this document has been retrospective. In addition to considering how the Lessons 
Learned to date apply to current and future Path Aware networking proposals, it's also worth 
considering whether there is deeper investigation left to do. 


* We note that this work was based on contributions from experts on various Path Aware
techniques, and all of the contributed techniques involved unicast protocols. We didn't
consider how these lessons might apply to multicast, and, given anecdotal reports at the IETF
109 Media Operations (MOPS) Working Group meeting of IP multicast offerings within data
centers at one or more cloud providers [MOPS-109-Min], it might be useful to think about
Path Awareness in multicast, before we have a history of unsuccessful deployments to
document.


* The question of whether a mechanism supports admission control, based on either
endpoints or applications, is associated with Path Awareness. One of the motivations of 
IntServ and a number of other architectures (e.g., Deterministic Networking [RFC8655]) is 
the ability to "say no" to an application based on resource availability on a path, before the 
application tries to inject traffic onto that path and discovers the path does not have the 
capacity to sustain enough utility to meet the application's minimum needs. The question of 
whether admission control is needed comes up repeatedly, but we have learned a few useful 
lessons that, while covered implicitly in some of the Lessons Learned provided in this 
document, might be explained explicitly: 

o We have gained a lot of experience with application-based adaptation since the days 
where applications just injected traffic inelastically into the network. Such adaptations 
seem to work well enough that admission control is of less value to these applications. 


o There are end-to-end measurement techniques that can steer traffic at the application 
layer (Content Delivery Networks (CDNs), multi-CDNs like Conviva [Conviva], etc.). 


o We noted in Section 4.12 that applications often don't know how to utilize Path Aware 
techniques. This includes not knowing enough about their admission control threshold to 
be able to ask accurately for the resources they need, whether this is because the 
application itself doesn't know or because the application has no way to signal its 
expectations to the underlying protocol stack. To date, attempts to help them haven't 
gotten anywhere (e.g., the multiple-TSPEC (Traffic Specification) additions to RSVP to 
attempt to mirror codec selection by applications [INTSERV-MULTIPLE-TSPEC] expired in 
2013). 


* We note that this work took the then-current IP network architecture as given, at least at the
time each technique was proposed. It might be useful to consider aspects of the now-current
IP network architecture that ease, or impede, Path Aware techniques. For example, there is
limited ability in IP to constrain bidirectional paths to be symmetric, and information-centric
networking protocols such as Named Data Networking (NDN) and Content-Centric
Networking (CCNx) [RFC8793] must force bidirectional path symmetry using
protocol-specific mechanisms.


6. Contributions 


Contributions on these Path Aware techniques were analyzed to arrive at the Lessons Learned 
captured in Section 4. 


Our expectation is that most readers will not need to read through this section carefully, but we 
wanted to record these hard-fought lessons as a service to others who may revisit this document, 
so they'll have the details close at hand. 
6.1. Stream Transport (ST, ST2, ST2+)


The suggested references for Stream Transport are: 


* "ST - A Proposed Internet Stream Protocol" [IEN-119]
* "Experimental Internet Stream Protocol: Version 2 (ST-II)" [RFC1190]
* "Internet Stream Protocol Version 2 (ST2) Protocol Specification - Version ST2+" [RFC1819]


The first version of Stream Transport, ST [IEN-119], was published in the late 1970s and was 
implemented and deployed on the ARPANET at small scale. It was used throughout the 1980s for 
experimental transmission of voice, video, and distributed simulation. 


The second version of the ST specification (ST2) [RFC1190] [RFC1819] was an experimental 
connection-oriented internetworking protocol that operated at the same layer as connectionless 
IP. ST2 packets could be distinguished by their IP header version numbers (IP, at that time, used 
version number 4, while ST2 used version number 5). 
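ST2 used the same version field as IPv4, so a receiving node could tell the two protocols apart by looking only at the first four bits of each packet. A minimal sketch follows. The function name and its return labels are hypothetical.

```python
def demux_by_version(packet: bytes) -> str:
    """Dispatch on the 4-bit version field at the start of the packet."""
    if not packet:
        raise ValueError("empty packet")
    version = packet[0] >> 4  # high-order nibble of the first byte
    if version == 4:
        return "ipv4"
    if version == 5:
        return "st2"
    return "unknown"
```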


ST2 used a control plane layered over IP to select routes and reserve capacity for real-time 
streams across a network path, based on a flow specification communicated by a separate 
protocol. The flow specification could be associated with QoS state in routers, producing an 
experimental resource reservation protocol. This allowed ST2 routers along a path to offer
end-to-end guarantees, primarily to satisfy the QoS requirements for real-time services over the
Internet. 


6.1.1. Reasons for Non-deployment 


Although implemented in a range of equipment, ST2 was not widely used after completion of the 
experiments. It did not offer the scalability and fate-sharing properties that have come to be 
desired by the Internet community. 


The ST2 protocol is no longer in use. 


6.1.2. Lessons Learned 


As time passed, the trade-off between router processing and link capacity changed. Links became
faster, and router processing became comparatively more expensive.


The ST2 control protocol used "hard state" -- once a route was established, and resources were 
reserved, routes and resources existed until they were explicitly released via signaling. A
soft-state approach was thought superior to this hard-state approach and led to development of the
IntServ model described in Section 6.2. 
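The difference between hard and soft state can be sketched as a reservation table. With hard state, a reservation persists until an explicit teardown arrives, so a lost teardown or a crashed endpoint leaves resources reserved. With soft state, each reservation expires unless it is refreshed periodically. The class, its method names, and the 90-second lifetime are illustrative assumptions, not values taken from the ST2 or RSVP specifications.

```python
from typing import Dict, Optional, Set


class ReservationTable:
    """Hypothetical per-flow reservations held as hard state (lifetime=None) or soft state."""

    def __init__(self, lifetime: Optional[float]) -> None:
        self.lifetime = lifetime            # None means hard state
        self.expiry: Dict[str, float] = {}  # flow id -> expiry time

    def reserve_or_refresh(self, flow: str, now: float) -> None:
        """Install a reservation, or extend a soft-state one."""
        if self.lifetime is None:
            self.expiry[flow] = float("inf")  # never expires on its own
        else:
            self.expiry[flow] = now + self.lifetime

    def release(self, flow: str) -> None:
        """Explicit teardown signal from the endpoint."""
        self.expiry.pop(flow, None)

    def active(self, now: float) -> Set[str]:
        """Purge expired entries and return flows still holding resources."""
        self.expiry = {f: t for f, t in self.expiry.items() if t > now}
        return set(self.expiry)


hard = ReservationTable(lifetime=None)
soft = ReservationTable(lifetime=90.0)  # arbitrary illustrative lifetime
for table in (hard, soft):
    table.reserve_or_refresh("stream-1", now=0.0)

# The endpoint crashes: no teardown and no refreshes ever arrive.
print(hard.active(now=3600.0))  # {'stream-1'}: resources stay reserved
print(soft.active(now=3600.0))  # set(): the reservation timed out
```

The cost of soft state is ongoing refresh traffic and processing. In return, failures clean themselves up without any explicit signaling.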


6.2. Integrated Services (IntServ) 


The suggested references for IntServ are: 


* "Integrated Services in the Internet Architecture: an Overview" [RFC1633]
* "Specification of the Controlled-Load Network Element Service" [RFC2211]


Dawkins Informational Page 14 


RFC 9049 What Not to Do June 2021 


* "Specification of Guaranteed Quality of Service" [RFC2212]
* "General Characterization Parameters for Integrated Service Network Elements" [RFC2215]
* "Resource ReSerVation Protocol (RSVP) -- Version 1 Functional Specification" [RFC2205]


In 1994, when the IntServ architecture document [RFC1633] was published, real-time traffic was 
first appearing on the Internet. At that time, bandwidth was still a scarce commodity. Internet 
Service Providers built networks over DS3 (45 Mbps) infrastructure, and sub-rate (< 1 Mbps) 
access was common. Therefore, the IETF anticipated a need for a fine-grained QoS mechanism. 


In the IntServ architecture, some applications can require service guarantees. Therefore, those 
applications use RSVP [RFC2205] to signal QoS reservations across network paths. Every router in 
the network that participates in IntServ maintains per-flow soft state to a) perform call 
admission control and b) deliver guaranteed service. 


Applications use Flow Specifications (Flow Specs, or FLOWSPECs) [RFC2210] to describe the 
traffic that they emit. RSVP reserves capacity for traffic on a per-Flow-Spec basis. 


6.2.1. Reasons for Non-deployment 


Although IntServ has been used in enterprise and government networks, IntServ was never 
widely deployed on the Internet because of its cost. The following factors contributed to 
operational cost: 


* IntServ must be deployed on every router that is on a path where IntServ is to be used.
Although it is possible to include a router that does not participate in IntServ along the path 
being controlled, if that router is likely to become a bottleneck, IntServ cannot be used to 
avoid that bottleneck along the path. 


* IntServ maintained per-flow state.


As IntServ was being discussed, the following occurred:


* For many expected uses, it became more cost effective to solve the QoS problem by adding
bandwidth. Between 1994 and 2000, Internet Service Providers upgraded their 
infrastructures from DS3 (45 Mbps) to OC-48 (2.4 Gbps). This meant that even if an endpoint 
was using IntServ in an IntServ-enabled network, its requests would rarely, if ever, be 
denied, so endpoints and Internet Service Providers had little reason to enable IntServ. 


* Diffserv [RFC2475] offered a more cost-effective, albeit less fine-grained, solution to the QoS
problem. 


6.2.2. Lessons Learned


The following lessons were learned:


* Any mechanism that requires every participating on-path router to maintain per-flow state
is not likely to succeed, unless the additional cost for offering the feature can be recovered
from the user.


* Any mechanism that requires an operator to upgrade all of its routers is not likely to
succeed, unless the additional cost for offering the feature can be recovered from the user. 
In environments where IntServ has been deployed, trust relationships with endpoints are very
different from trust relationships on the Internet itself. There are often clearly defined 
hierarchies in Service Level Agreements (SLAs) governing well-defined transport flows operating 
with predetermined capacity and latency requirements over paths where capacity or other 
attributes are constrained. 


IntServ was never widely deployed to manage capacity across the Internet. However, the 
technique that it produced was deployed for reasons other than bandwidth management. RSVP is 
widely deployed as an MPLS signaling mechanism. BGP reuses the RSVP concept of Filter Specs to 
distribute firewall filters, although they are called "Flow Spec Component Types" in BGP 
[RFC5575]. 


6.3. Quick-Start TCP 


The suggested references for Quick-Start TCP are: 


* "Quick-Start for TCP and IP" [RFC4782]
* "Determining an appropriate sending rate over an underutilized network path" [SAF07]
* "Fast Startup Internet Congestion Control for Broadband Interactive Applications" [Sch11]
* "Using Quick-Start to enhance TCP-friendly rate control performance in bidirectional satellite
networks" [QS-SAT]


Quick-Start is defined in an Experimental RFC [RFC4782] and is a TCP extension that leverages 
support from the routers on the path to determine an allowed initial sending rate for a path 
through the Internet, either at the start of data transfers or after idle periods. Without 
information about the path, a sender cannot easily determine an appropriate initial sending rate. 
The default TCP congestion control therefore uses the safe but time-consuming slow-start 
algorithm [RFC5681]. With Quick-Start, connections are allowed to use higher initial sending 
rates if there is significant unused bandwidth along the path and if the sender and all of the 
routers along the path approve the request. 


By examining the Time To Live (TTL) field in Quick-Start packets, a sender can determine if 
routers on the path have approved the Quick-Start request. However, this method is unable to 
take into account the routers hidden by tunnels or other network nodes invisible at the IP layer. 
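The TTL check can be sketched as follows, loosely following the TTL Diff comparison in
[RFC4782]: the sender remembers the difference between the IP TTL and a separate Quick-Start
TTL, every approving router decrements both, and any router that does not approve changes the
difference. The function names are hypothetical, and the nonce, rate fields, and receiver echo are
omitted. A router hidden inside a tunnel touches neither value, which is the limitation noted
above.

```python
import random


def send_request(ip_ttl: int, rng: random.Random) -> tuple:
    """Sender: pick a random QS TTL and remember (IP TTL - QS TTL) mod 256."""
    qs_ttl = rng.randrange(256)
    expected_diff = (ip_ttl - qs_ttl) % 256
    return qs_ttl, expected_diff


def traverse(ip_ttl: int, qs_ttl: int, router_approves: list) -> tuple:
    """Each router decrements the IP TTL; only approving routers also decrement the QS TTL."""
    for approves in router_approves:
        ip_ttl -= 1
        if approves:
            qs_ttl = (qs_ttl - 1) % 256
    return ip_ttl, qs_ttl


def request_approved(expected_diff: int, ip_ttl: int, qs_ttl: int) -> bool:
    """The difference seen at the receiver must equal the value the sender stored."""
    return (ip_ttl - qs_ttl) % 256 == expected_diff


rng = random.Random(1)
qs, diff = send_request(64, rng)
ip, qs_after = traverse(64, qs, [True, True, True])
print(request_approved(diff, ip, qs_after))   # True: all three routers approved
ip, qs_after = traverse(64, qs, [True, False, True])
print(request_approved(diff, ip, qs_after))   # False: the middle router did not
```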


The protocol also includes a nonce that provides protection against cheating routers and 
receivers. If the Quick-Start request is explicitly approved by all routers along the path, the TCP 
host can send at up to the approved rate; otherwise, TCP would use the default congestion 
control. Quick-Start requires modifications in the involved end systems as well as in routers. Due 
to the resulting deployment challenges, Quick-Start was only proposed in [RFC4782] for 
controlled environments. 


The Quick-Start mechanism is a lightweight, coarse-grained, in-band, network-assisted fast 
startup mechanism. The benefits are studied by simulation in a research paper [SAF07] that 
complements the protocol specification. The study confirms that Quick-Start can significantly 
speed up mid-sized data transfers. That paper also presents router algorithms that do not require 
keeping per-flow state. Later studies [Sch11] comprehensively analyze Quick-Start with a full
Linux implementation and with a router fast-path prototype using a network processor. In both 
cases, Quick-Start could be implemented with limited additional complexity. 


6.3.1. Reasons for Non-deployment 


However, experiments with Quick-Start in [Sch11] revealed several challenges: 


* Having information from the routers along the path can reduce the risk of congestion but
cannot avoid it entirely. Determining whether there is unused capacity is not trivial in actual 
router and host implementations. Data about available capacity visible at the IP layer may be 
imprecise, and due to the propagation delay, information can already be outdated when it 
reaches a sender. There is a trade-off between the speedup of data transfers and the risk of 
congestion even with Quick-Start. This could be mitigated by only allowing Quick-Start to 
access a proportion of the unused capacity along a path. 


* For scalable router fast-path implementations, it is important to enable parallel processing of
packets, as this is a widely used method, e.g., in network processors. One challenge is 
synchronization of information between packets that are processed in parallel, which should 
be avoided as much as possible. 


* Only some types of application traffic can benefit from Quick-Start. Capacity needs to be
requested and discovered. The discovered capacity needs to be utilized by the flow, or it 
implicitly becomes available for other flows. Failing to use the requested capacity may have 
already reduced the pool of Quick-Start capacity that was made available to other competing 
Quick-Start requests. The benefit is greatest when senders use this only for bulk flows and 
avoid sending unnecessary Quick-Start requests, e.g., for flows that only send a small amount 
of data. Choosing an appropriate request size requires application-internal knowledge that is 
not commonly expressed by the transport API. How a sender can determine the rate for an 
initial Quick-Start request is still a largely unsolved problem. 
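The mitigation in the first item, letting Quick-Start use only a proportion of the unused capacity,
can be combined with the idea of router algorithms that avoid per-flow state. The sketch below
keeps only an aggregate of approved rates. The class name, the `fraction` parameter, and the
absence of any aging of old approvals are assumptions made for brevity; this is not necessarily
the algorithm studied in [SAF07].

```python
class QuickStartRouter:
    """Approves Quick-Start rate requests against aggregate (not per-flow) state."""

    def __init__(self, capacity_bps: float, fraction: float = 0.5):
        self.capacity_bps = capacity_bps
        self.fraction = fraction   # share of currently unused capacity offered to Quick-Start
        self.approved_bps = 0.0    # aggregate of approved rates (never aged out in this sketch)

    def handle_request(self, requested_bps: float, measured_load_bps: float) -> float:
        """Return the approved rate: the full request, a reduced rate, or 0.0 (rejected)."""
        unused = max(0.0, self.capacity_bps - measured_load_bps)
        budget = self.fraction * unused - self.approved_bps
        granted = max(0.0, min(requested_bps, budget))
        self.approved_bps += granted
        return granted


router = QuickStartRouter(capacity_bps=100e6, fraction=0.5)
for _ in range(3):
    # 60 Mbps measured load -> 40 Mbps unused -> at most 20 Mbps offered to Quick-Start
    print(router.handle_request(15e6, measured_load_bps=60e6))
```

The three requests receive 15 Mbps, a reduced 5 Mbps, and nothing, so a sender that wastes its
grant has shrunk the pool available to competing requests, as the third item describes.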


There is no known deployment of Quick-Start for TCP or other IETF transports. 


6.3.2. Lessons Learned 


Some lessons can be learned from Quick-Start. Despite being a very lightweight protocol, Quick-Start
suffers from poor incremental deployment properties regarding both a) the required
modifications in network infrastructure and b) its interactions with applications. Except for 
corner cases, congestion control can be quite efficiently performed end to end in the Internet, 
and in modern stacks there is not much room for significant improvement by additional network 
support. 


After publication of the Quick-Start specification, there have been large-scale experiments with 
an initial window of up to 10 segments [RFC6928]. This alternative "IW10" approach can also
ramp up data transfers faster than the standard congestion control, but it only requires sender-side
modifications. As a result, this approach can be deployed more easily and incrementally in the
Internet. While Quick-Start can theoretically outperform "IW10", the improvement in data transfer
completion time can, in many cases, be small. After publication of [RFC6928], most
modern TCP stacks have increased their default initial window.
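A back-of-the-envelope calculation shows why the gain is often modest. The sketch counts
loss-free slow-start round trips, assuming the congestion window doubles every round trip and
ignoring the handshake, delayed acknowledgments, and the receiver window. An initial window of
3 segments is used as the pre-IW10 baseline (the value the RFC 3390 formula gives for a
1460-byte MSS); the function name is hypothetical.

```python
def slow_start_rounds(segments: int, initial_window: int) -> int:
    """Round trips needed to send `segments` in loss-free slow start (cwnd doubles per RTT)."""
    rounds, sent, cwnd = 0, 0, initial_window
    while sent < segments:
        sent += cwnd
        cwnd *= 2
        rounds += 1
    return rounds


for size in (10, 30, 100):
    print(size, slow_start_rounds(size, 3), slow_start_rounds(size, 10))
# 10 3 1
# 30 4 2
# 100 6 4
```

In this idealized model, IW10 saves two round trips for each of these transfer sizes, leaving only
a few round trips for any path-approved initial rate to save on top of it.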
6.4. ICMP Source Quench


The suggested reference for ICMP Source Quench is: 
* "Internet Control Message Protocol" [RFC0792] 


The ICMP Source Quench message [RFC0792] allowed an on-path router to request the source of 
a flow to reduce its sending rate. This method allowed a router to provide an early indication of 
impending congestion on a path to the sources that contribute to that congestion. 


6.4.1. Reasons for Non-deployment 


This method was deployed in Internet routers over a period of time; the reaction of endpoints to 
receiving this signal has varied. For low-speed links with low multiplexing of flows, the method
could be used to regulate (momentarily reduce) the transmission rate. However, the simple 
signal does not scale with link speed or with the number of flows sharing a link. 


The approach was overtaken by the evolution of congestion control methods in TCP [RFC2001], 
and later also by other IETF transports. Because these methods were based upon measurement 
of the end-to-end path and an algorithm in the endpoint, they were able to evolve and mature 
more rapidly than methods relying on interactions between operational routers and endpoint 
stacks. 


After ICMP Source Quench was specified, the IETF began to recommend that transports provide 
end-to-end congestion control [RFC2001]. The Source Quench method has been obsoleted by the 
IETF [RFC6633], and both hosts and routers must now silently discard this message. 


6.4.2. Lessons Learned 


This method had several problems. 


First, [RFC0792] did not sufficiently specify how the sender would react to the ICMP Source 
Quench signal from the path (e.g., [RFC1016]). There was ambiguity in how the sender should 
utilize this additional information. This could lead to unfairness in the way that receivers (or 
routers) responded to this message. 


Second, while the message did provide additional information, the Explicit Congestion 
Notification (ECN) mechanism [RFC3168] provided a more robust and informative signal for 
network nodes to provide early indication that a path has become congested. 


The mechanism originated at a time when the Internet trust model was very different. Most 
endpoint implementations did not attempt to verify that the message originated from an on-path 
node before they utilized the message. This made it vulnerable to Denial-of-Service (DoS) attacks. 
In theory, routers might have chosen to use the quoted packet contained in the ICMP payload to 
validate that the message originated from an on-path node, but this would have increased per-packet
processing overhead for each router along the path and would have required transport
functionality in the router to verify whether the quoted packet header corresponded to a packet 
the router had sent. In addition, Section 5.2 of [RFC4443] noted ICMPv6-based attacks on hosts 
that would also have threatened routers processing ICMPv6 Source Quench payloads. As time
passed, it became increasingly obvious that the lack of validation of the messages exposed 
receivers to a security vulnerability where the messages could be forged to create a tangible DoS 
opportunity. 
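The kind of validation that was missing, checking that the packet quoted inside an ICMP message
matches traffic the receiver actually sent, might look like the following for a TCP endpoint. This
is a sketch in the spirit of the checks discussed in RFC 5927; the `Connection` record, the dictionary
form of the quoted header, and the window test are assumptions. Today, both hosts and routers
simply discard Source Quench [RFC6633].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """Hypothetical per-connection state that a TCP sender already keeps."""
    src: str
    sport: int
    dst: str
    dport: int
    snd_una: int  # oldest unacknowledged sequence number
    snd_nxt: int  # next sequence number to be sent


def quoted_packet_is_plausible(quoted: dict, connections: list) -> bool:
    """Accept an ICMP error only if the quoted header matches an active connection
    and the quoted sequence number lies between snd_una and snd_nxt.
    Sequence-number wraparound is ignored to keep the sketch short."""
    key = (quoted["src"], quoted["sport"], quoted["dst"], quoted["dport"])
    for conn in connections:
        if key == (conn.src, conn.sport, conn.dst, conn.dport):
            return conn.snd_una <= quoted["seq"] < conn.snd_nxt
    return False


conns = [Connection("192.0.2.1", 40000, "198.51.100.7", 443, snd_una=1000, snd_nxt=5000)]
quoted = {"src": "192.0.2.1", "sport": 40000, "dst": "198.51.100.7", "dport": 443, "seq": 2000}
print(quoted_packet_is_plausible(quoted, conns))                   # True
print(quoted_packet_is_plausible({**quoted, "seq": 9000}, conns))  # False: outside the window
print(quoted_packet_is_plausible({**quoted, "sport": 1}, conns))   # False: no such connection
```

An off-path attacker must now guess both the connection's addresses and ports and an in-window
sequence number, which raises the bar but, as the text notes, costs extra per-message processing.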


6.5. Triggers for Transport (TRIGTRAN)


The suggested references for TRIGTRAN are:


* TRIGTRAN BOF at IETF 55 [TRIGTRAN-55]
* TRIGTRAN BOF at IETF 56 [TRIGTRAN-56]


TCP [RFC0793] has a well-known weakness -- the end-to-end flow control mechanism has only a 
single signal, the loss of a segment, detected when no acknowledgment for the lost segment is 
received at the sender. There are multiple reasons why the sender might not have received an 
acknowledgment for the segment. To name several, the segment could have been trapped in a
routing loop, damaged in transmission and failed checksum verification at the receiver, or
discarded by some intermediate device; alternatively, any of a variety of things could have
happened to the acknowledgment on its way back from the receiver to the sender. TCP
implementations since the late 1980s have made the "safe" decision and have interpreted the loss 
of a segment as evidence that the path between two endpoints may have become congested 
enough to exhaust buffers on intermediate hops, so that the TCP sender should "back off" -- 
reduce its sending rate until it knows that its segments are now being delivered without loss 
[RFC5681]. 


The thinking behind TRIGTRAN was that if a path completely stopped working because a link 
along the path was "down", somehow something along the path could signal TCP when that link 
returned to service, and the sending TCP could retry immediately, without waiting for a full 
retransmission timeout (RTO) period. 


6.5.1. Reasons for Non-deployment 


The early dreams for TRIGTRAN were dashed because of an assumption that TRIGTRAN triggers 
would be unauthenticated. This meant that any "safe" TRIGTRAN mechanism would have relied 
on a mechanism such as setting the IPv4 TTL or IPv6 Hop Limit to 255 at a sender and testing
that it was 254 upon receipt, so that a receiver could verify that a signal was generated by an
adjacent sender known to be on the path being used and not some unknown sender that might
not even be on the path (e.g., "The Generalized TTL Security Mechanism (GTSM)" [RFC5082]). This
situation is very similar to the case for ICMP Source Quench messages as described in Section 6.4, 
which were also unauthenticated and could be sent by an off-path attacker, resulting in 
deprecation of ICMP Source Quench message processing [RFC6633]. 
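The hop-count test described above can be sketched in a few lines. The constants mirror the
example in the paragraph (255 at the sender, one hop away) and, like the function name, are
assumptions; [RFC5082] specifies the general mechanism. A check like this shows only that a signal
came from nearby, not that it is genuine, which is why relying on it limits useful signals to
adjacent nodes.

```python
SENDER_TTL = 255   # value the adjacent sender is assumed to write into IPv4 TTL / IPv6 Hop Limit
HOPS = 1           # the paragraph's example: 255 at the sender, 254 on receipt


def trigger_passes_hop_check(received_ttl: int, sender_ttl: int = SENDER_TTL,
                             hops: int = HOPS) -> bool:
    """Accept a trigger only if it arrived with at least sender_ttl - hops.

    Every router on the way decrements the field and no sender can start above 255,
    so a signal from a more distant (possibly off-path) node arrives with a smaller value.
    This proves proximity, not authenticity: any node within `hops` can still pass."""
    return received_ttl >= sender_ttl - hops


print(trigger_passes_hop_check(254))  # True: consistent with an adjacent sender
print(trigger_passes_hop_check(250))  # False: originated several hops away
```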


TRIGTRAN's scope shrank from "the path is down" to "the first-hop link is down."


But things got worse. 
Because TRIGTRAN triggers would only be provided when the first-hop link was "down",
TRIGTRAN triggers couldn't replace normal TCP retransmission behavior if the path failed 
because some link further along the network path was "down". So TRIGTRAN triggers added 
complexity to an already-complex TCP state machine and did not allow any existing complexity 
to be removed. 


There was also an issue that the TRIGTRAN signal was not sent in response to a specific host that 
had been sending packets and was instead a signal that stimulated a response by any sender on 
the link. This needs to scale when there are multiple flows trying to use the same resource, yet 
the sender of a trigger has no understanding of how many of the potential traffic sources will 
respond by sending packets -- if recipients of the signal "back off" their responses to a trigger to 
improve scaling, then that immediately reduces the benefit of the signal.


Finally, intermediate forwarding nodes required modification to provide TRIGTRAN triggers, but 
operators couldn't charge for TRIGTRAN triggers, so there was no way to recover the cost of 
modifying, testing, and deploying updated intermediate nodes. 


Two TRIGTRAN BOFs were held, at IETF 55 [TRIGTRAN-55] and IETF 56 [TRIGTRAN-56], but this 
work was not chartered, and there was no interest in deploying TRIGTRAN unless it was 
chartered and standardized in the IETF. 


6.5.2. Lessons Learned 


The reasons why this work was not chartered, much less deployed, provide several useful lessons 
for researchers. 


* TRIGTRAN started with a plausible value proposition, but networking realities in the early
2000s forced reductions in scope that led directly to reductions in potential benefits but no 
corresponding reductions in costs and complexity. 


* These reductions in scope were the direct result of an inability for hosts to trust or
authenticate TRIGTRAN signals they received from the network. 


* Operators did not believe they could charge for TRIGTRAN signaling, because first-hop links
didn't fail frequently and TRIGTRAN provided no reduction in operating expenses, so there 
was little incentive to purchase and deploy TRIGTRAN-capable network equipment. 


It is also worth noting that the targeted environment for TRIGTRAN in the late 1990s contained 
links with a relatively small number of directly connected hosts -- for instance, cellular or 
satellite links. The transport community was well aware of the dangers of sender 
synchronization based on multiple senders receiving the same stimulus at the same time, but the 
working assumption for TRIGTRAN was that there wouldn't be enough senders for this to be a 
meaningful problem. In the 2010s, it was common for a single "link" to support many senders 
and receivers, likely requiring TRIGTRAN senders to wait some random amount of time before 
sending after receiving a TRIGTRAN signal, which would have reduced the benefits of TRIGTRAN 
even more. 
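The trade-off in the last paragraph can be illustrated with a small simulation: many senders react
to the same trigger, either immediately or after a uniformly random wait. The function name, the
number of senders, the maximum wait, and the slot length are arbitrary assumptions chosen only to
make the effect visible.

```python
import random
from collections import Counter


def peak_senders_per_slot(n_senders: int, max_wait: float, slot: float,
                          rng: random.Random) -> int:
    """All senders see the same trigger at t = 0; each then waits a random time drawn
    uniformly from [0, max_wait) before sending. Return the largest number of senders
    that transmit within any single time slot of length `slot`."""
    slots = Counter(int(rng.random() * max_wait // slot) for _ in range(n_senders))
    return max(slots.values())


rng = random.Random(0)
print(peak_senders_per_slot(200, max_wait=0.0, slot=0.01, rng=rng))  # 200: everyone at once
print(peak_senders_per_slot(200, max_wait=1.0, slot=0.01, rng=rng))  # far fewer, but up to 1 s later
```

Jitter spreads the burst, but every second of added waiting is time the trigger was supposed to
save compared with waiting for a retransmission timeout.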
6.6. Shim6


The suggested reference for Shim6 is: 
* "Shim6: Level 3 Multihoming Shim Protocol for IPv6" [RFC5533]


The IPv6 routing architecture [RFC1887] assumed that most sites on the Internet would be 
identified by Provider Assigned IPv6 prefixes, so that Default-Free Zone routers only contained 
routes to other providers, resulting in a very small IPv6 global routing table. 


For a single-homed site, this could work well. A multihomed site with only one upstream 
provider could also work well, although BGP multihoming from a single upstream provider was 
often a premium service (costing more than twice as much as two single-homed sites), and if the 
single upstream provider went out of service, all of the multihomed paths could fail 
simultaneously. 


IPv4 sites often multihomed by obtaining Provider Independent prefixes and advertising these 
prefixes through multiple upstream providers. With the assumption that any multihomed IPv4 
site would also multihome in IPv6, it seemed likely that IPv6 routing would be subject to the 
same pressures to announce Provider Independent prefixes, resulting in an IPv6 global routing 
table that exhibited the same explosive growth as the IPv4 global routing table. During the early 
2000s, work began on a protocol that would provide multihoming for IPv6 sites without 
requiring sites to advertise Provider Independent prefixes into the IPv6 global routing table. 


This protocol, called "Shim6", allowed two endpoints to exchange multiple addresses ("Locators") 
that all mapped to the same endpoint ("Identity"). After an endpoint learned multiple Locators 
for the other endpoint, it could send to any of those Locators with the expectation that those 
packets would all be delivered to the endpoint with the same Identity. Shim6 was an example of 
an "Identity/Locator Split" protocol. 


Shim6, as defined in [RFC5533] and related RFCs, provided a workable solution for IPv6 
multihoming using Provider Assigned prefixes, including capability discovery and negotiation, 
and allowing end-to-end application communication to continue even in the face of path failure, 
because applications don't see Locator failures and continue to communicate with the same 
Identity using a different Locator. 
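The Identity/Locator split can be sketched as a per-peer context on the host: the application keeps
using one Identity while the shim picks which Locator goes on the wire. The class and method names
are hypothetical, the addresses come from the 2001:db8::/32 documentation prefix, and Shim6's
context establishment and failure detection protocol (RFC 5534) are not modeled.

```python
class Shim6Context:
    """Hypothetical, heavily simplified host-side Shim6 state for one peer.

    The application always addresses the peer's Identity (its ULID); the shim
    decides which of the peer's Locators is actually placed in the packet."""

    def __init__(self, peer_identity: str, peer_locators: list):
        self.peer_identity = peer_identity
        self.locators = list(peer_locators)
        self.failed = set()

    def current_locator(self) -> str:
        for loc in self.locators:
            if loc not in self.failed:
                return loc
        raise ConnectionError(f"no working locator for {self.peer_identity}")

    def report_failure(self, locator: str) -> None:
        """Called by failure detection when the path via `locator` stops working."""
        self.failed.add(locator)

    def send(self, payload: bytes) -> tuple:
        """Return (identity the application sees, locator used on the wire, payload)."""
        return self.peer_identity, self.current_locator(), payload


ctx = Shim6Context("2001:db8:a::1", ["2001:db8:a::1", "2001:db8:b::1"])
print(ctx.send(b"hello"))   # ('2001:db8:a::1', '2001:db8:a::1', b'hello')
ctx.report_failure("2001:db8:a::1")
print(ctx.send(b"hello"))   # ('2001:db8:a::1', '2001:db8:b::1', b'hello')
```

Note that the switch happens entirely inside the host, which is the source of the operator
concerns about visibility and control discussed in Section 6.6.1.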


6.6.1. Reasons for Non-deployment 


Note that the problem being addressed was "site multihoming", but Shim6 was providing "host 
multihoming". That meant that the decision about what path would be used was under host 
control, not under edge router control. 


Although more work could have been done to provide a better technical solution, the biggest 
impediments to Shim6 deployment were operational and business considerations. These 
impediments were discussed at multiple network operator group meetings, including [Shim6-35] 
at [NANOG-35]. 
The technical issues centered around concerns that Shim6 relied on the host to track all the
connections, while also tracking Identity/Locator mappings in the kernel and tracking path state to
recognize when an available path has failed.


The operational issues centered around concerns that operators were performing traffic 
engineering on traffic aggregates. With Shim6, these operator traffic engineering policies must be 
pushed down to individual hosts. 


In addition, operators would have no visibility or control over the decision of hosts choosing to 
switch to another path. They expressed concerns that relying on hosts to steer traffic exposed 
operator networks to oscillation based on feedback loops, if hosts moved from path to path 
frequently. Given that Shim6 was intended to support multihoming across operators, operators 
providing only one of the paths would have even less visibility as traffic suddenly appeared and 
disappeared on their networks. 


In addition, firewalls that expected to find a TCP or UDP transport-level protocol header in the IP 
payload would see a Shim6 Identity header instead, and they would not perform transport- 
protocol-based firewalling functions because the firewall's normal processing logic would not 
look past the Identity header. The firewall would perform its default action, which would most 
likely be to drop packets that don't match any processing rule. 
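The firewall behavior described above can be sketched as a rule lookup that only understands a
TCP or UDP header immediately following the IPv6 header. The rule set and function name are
hypothetical; the value 140 is the IPv6 Next Header value assigned to Shim6 [RFC5533].

```python
from typing import Optional

TCP, UDP = 6, 17
SHIM6 = 140  # IPv6 Next Header value assigned to Shim6 [RFC5533]

# Hypothetical rule set that only knows transport headers: (protocol, destination port).
ALLOWED = {(TCP, 443), (TCP, 22), (UDP, 53)}


def firewall_verdict(next_header: int, dst_port: Optional[int]) -> str:
    """Look only at the header that immediately follows the IPv6 header."""
    if next_header in (TCP, UDP):
        return "accept" if (next_header, dst_port) in ALLOWED else "drop"
    # A Shim6 header (or anything else) matches no rule, so the default action applies,
    # even if a permitted TCP segment follows the Shim6 header.
    return "drop"


print(firewall_verdict(TCP, 443))     # accept
print(firewall_verdict(SHIM6, None))  # drop
```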


The business issues centered on Shim6 reducing or removing operators' ability to sell BGP
multihoming service to their own customers; that service is often more expensive than two
single-homed connectivity services.


6.6.2. Lessons Learned 


It is extremely important to take operational concerns into account when a Path Aware protocol 
is making decisions about path selection that may conflict with existing operational practices and 
business considerations. 


6.6.3. Addendum on Multipath TCP 


During discussions in the PANRG session at IETF 103 [PANRG-103-Min], Lars Eggert, past 
Transport Area Director, pointed out that during charter discussions for the Multipath TCP 
Working Group [MP-TCP], operators expressed concerns that customers could use Multipath TCP 
to load-share TCP connections across operators simultaneously and compare passive 
performance measurements across network paths in real time, changing the balance of power in 
those business relationships. Although the Multipath TCP Working Group was chartered, this 
concern could have acted as an obstacle to deployment. 


Operator objections to Shim6 were focused on technical concerns, but this concern could have 
also been an obstacle to Shim6 deployment if the technical concerns had been overcome. 


6.7. Next Steps in Signaling (NSIS) 


The suggested references for Next Steps in Signaling (NSIS) are: 


* the concluded working group charter [NSIS-CHARTER-2001]
* "GIST: General Internet Signalling Transport" [RFC5971]
* "NAT/Firewall NSIS Signaling Layer Protocol (NSLP)" [RFC5973]
* "NSIS Signaling Layer Protocol (NSLP) for Quality-of-Service Signaling" [RFC5974]
* "Authorization for NSIS Signaling Layer Protocols" [RFC5981]


The NSIS Working Group worked on signaling techniques for network-layer resources (e.g., QoS 
resource reservations, Firewall and NAT traversal). 


When RSVP [RFC2205] was used in deployments, a number of questions came up about its 
perceived limitations and potential missing features. The issues noted in the NSIS Working Group 
charter [NSIS-CHARTER-2001] include interworking between domains with different QoS 
architectures, mobility and roaming for IP interfaces, and complexity. Later, the lack of security 
in RSVP was also recognized [RFC4094]. 


The NSIS Working Group was chartered to tackle those issues and initially focused on QoS
signaling as its primary use case. However, over time a new approach evolved that introduced a
modular architecture with two layers: a) application-specific signaling protocols, the NSIS Signaling
Layer Protocols (NSLPs), on top of b) a generic signaling transport protocol, the NSIS Transport
Layer Protocol (NTLP).


NTLP is defined in [RFC5971]. Two types of NSLPs are defined: an NSLP for QoS signaling 
[RFC5974] and an NSLP for NATs/firewalls [RFC5973]. 


6.7.1. Reasons for Non-deployment 


The obstacles for deployment can be grouped into implementation-related aspects and 
operational aspects. 


* Implementation-related aspects:


Although NSIS provides benefits with respect to flexibility, mobility, and security compared 
to other network signaling techniques, hardware vendors were reluctant to implement this
solution, because it would require additional implementation effort and would result in 
additional complexity for router implementations. 


NTLP mainly operates as a path-coupled signaling protocol, i.e., its messages are processed at 
the control plane of each intermediate node that is also forwarding the data flows. This 
requires a mechanism to intercept signaling packets while they are forwarded in the same 
manner (especially along the same path) as data packets. NSIS uses the IPv4 and IPv6 Router 
Alert Option (RAO) to allow for interception of those path-coupled signaling messages, and 
this technique requires router implementations to correctly understand and implement the 
handling of RAOs, e.g., to only process packets with RAOs of interest and to leave packets 
with irrelevant RAOs in the fast forwarding processing path (a comprehensive discussion of 
these issues can be found in [RFC6398]). The latter was an issue with some router 
implementations at the time of standardization. 


Another reason is that path-coupled signaling protocols that interact with routers and 
request manipulation of state at these routers (or any other network element in general) are 
under scrutiny: a packet (or sequence of packets) out of the mainly untrusted data path is 
requesting creation and manipulation of network state. This is seen as potentially dangerous
(e.g., opens up a DoS threat to a router's control plane) and difficult for an operator to 
control. Path-coupled signaling approaches were considered problematic (see also Section 3 
of [RFC6398]). There are recommendations on how to secure NSIS nodes and deployments 
(e.g., [RFC5981]). 


* Operational aspects:


NSIS not only required trust between customers and their provider, but also among different 
providers. In particular, QoS signaling techniques would require some kind of dynamic SLA 
support that would imply (potentially quite complex) bilateral negotiations between 
different Internet Service Providers. This complexity was not considered to be justified, and 
increasing the bandwidth (and thus avoiding bottlenecks) was cheaper than actively 
managing network resource bottlenecks by using path-coupled QoS signaling techniques. 
Furthermore, an end-to-end path typically involves several provider domains, and these 
providers need to closely cooperate in cases of failures. 


6.7.2. Lessons Learned 


One goal of NSIS was to decrease the complexity of the signaling protocol, but a path-coupled 
signaling protocol comes with the intrinsic complexity of IP-based networks, beyond the 
complexity of the signaling protocol itself. Sources of intrinsic complexity include: 


* the presence of asymmetric routes between endpoints and routers.
* the lack of security and trust at large in the Internet infrastructure.
* the presence of different trust boundaries.
* the effects of best-effort networks (e.g., robustness to packet loss).
* divergence from the fate-sharing principle (e.g., state within the network).


Any path-coupled signaling protocol has to deal with these realities. 


Operators view the use of IPv4 and IPv6 Router Alert Options (RAOs) to signal routers along the 
path from end systems with suspicion, because these end systems are usually not authenticated 
and heavy use of RAOs can easily increase the CPU load on routers that are designed to process
most packets using a hardware "fast path" and to divert packets containing RAOs to a slower,
more capable processor.


6.8. IPv6 Flow Labels 


The suggested reference for IPv6 Flow Labels is: 
* "IPv6 Flow Label Specification" [RFC6437] 


IPv6 specifies a 20-bit Flow Label field [RFC6437], included in the fixed part of the IPv6 header 
and hence present in every IPv6 packet. An endpoint sets the value in this field to one of a set of 
pseudorandomly assigned values. If a packet is not part of any flow, the flow label value is set to 
zero [RFC3697]. A number of Standards Track and Best Current Practice RFCs (e.g., [RFC8085], 
[RFC6437], [RFC6438]) encourage IPv6 endpoints to set a non-zero value in this field. A 
multiplexing transport could choose to use multiple flow labels to allow the network to either
independently forward its subflows or use one common value for the traffic aggregate. The flow 
label is present in all fragments. IPsec was originally put forward as one important use case for
this mechanism and does not encrypt the field [RFC6438].
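As an illustration of how an endpoint might choose a pseudorandom but stable per-flow value, the
sketch below hashes the flow's 5-tuple with a local secret, loosely following the stateless hash
approach suggested in [RFC6437]. The use of HMAC-SHA-256, the function name, and the remapping of
a zero result to 1 are assumptions; the remap introduces a negligible bias.

```python
import hashlib
import hmac
import ipaddress

FLOW_LABEL_MASK = 0xFFFFF  # the Flow Label field is 20 bits wide


def flow_label(secret: bytes, src: str, dst: str, proto: int, sport: int, dport: int) -> int:
    """Pseudorandom but stable 20-bit label for one transport flow."""
    msg = (ipaddress.IPv6Address(src).packed + ipaddress.IPv6Address(dst).packed
           + bytes([proto]) + sport.to_bytes(2, "big") + dport.to_bytes(2, "big"))
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    label = int.from_bytes(digest[:4], "big") & FLOW_LABEL_MASK
    return label or 1  # zero means "no flow label", so never emit it


key = b"example-local-secret"
a = flow_label(key, "2001:db8::1", "2001:db8::2", 6, 40000, 443)
print(a == flow_label(key, "2001:db8::1", "2001:db8::2", 6, 40000, 443))  # True: stable per flow
```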


Once set, the flow label can provide information that can help inform network nodes about 
subflows present at the transport layer, without needing to interpret the setting of upper-layer 
protocol fields [RFC6294]. This information can also be used to coordinate how aggregates of 
transport subflows are grouped when queued in the network and to select appropriate per-flow 
forwarding when choosing between alternate paths [RFC6438] (e.g., for Equal-Cost Multipath 
(ECMP) routing and Link Aggregation Groups (LAGs)). 
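A minimal sketch of flow-label-based path selection follows, assuming a router hashes the source
address, destination address, and flow label to pick among equal-cost next hops. SHA-256 is used
only for readability (forwarding hardware uses much cheaper hashes), and the function name is
hypothetical.

```python
import hashlib


def ecmp_next_hop(src: str, dst: str, flow_label: int, next_hops: list) -> str:
    """Select one of several equal-cost next hops from the IPv6 3-tuple.

    Only network-layer fields are used, so the router never parses transport
    headers, and every packet carrying the same label follows the same path."""
    key = f"{src}|{dst}|{flow_label}".encode()
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % len(next_hops)
    return next_hops[bucket]


hops = ["fe80::a", "fe80::b", "fe80::c"]
first = ecmp_next_hop("2001:db8::1", "2001:db8::2", 0x12345, hops)
print(first == ecmp_next_hop("2001:db8::1", "2001:db8::2", 0x12345, hops))  # True
```

A multiplexing transport that assigns different labels to its subflows would spread them across
next hops, while a shared label keeps the aggregate on one path.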


6.8.1. Reasons for Non-deployment 


Despite the field being present in every IPv6 packet, the mechanism did not receive as much use 
as originally envisioned. One reason is that to be useful it requires engagement by two different 
stakeholders: 


* Endpoint Implementation:


For network nodes along a path to utilize the flow label, there needs to be a non-zero value 
inserted in the field [RFC6437] at the sending endpoint. There needs to be an incentive for an 
endpoint to set an appropriate non-zero value. The value should appropriately reflect the 
level of aggregation the traffic expects to be provided by the network. However, this requires 
the stack to know the granularity at which flows should be identified (or, conversely, which
flows should receive aggregated treatment), i.e., which packets carry the same flow label. 
Therefore, setting a non-zero value may result in additional choices that need to be made by 
an application developer. 


Although the original flow label standard [RFC3697] forbids any encoding of meaning into 
the flow label value, the opportunity to use the flow label as a covert channel or to signal 
other meta-information may have raised concerns about setting a non-zero value [RFC6437]. 


Until network nodes widely use the flow label, there may be no incentive for an endpoint to set
the field.


* Operational support in network nodes:


A benefit can only be realized when a network node along the path also uses this 
information to inform its decisions. Network equipment (routers and/or middleboxes) needs
to include appropriate support in order to utilize the field when making decisions about how 
to classify flows or forward packets. The use of any optional feature in a network node also 
requires corresponding updates to operational procedures and therefore is normally only 
introduced when the cost can be justified. 


A benefit from utilizing the flow label is expected to be increased quality of experience for 
applications -- but this comes at some operational cost to an operator and requires endpoints 
to set the field. 
6.8.2. Lessons Learned


The flow label is a general-purpose header field for use by the path. Multiple uses have been 
proposed. One candidate use was to reduce the complexity of forwarding decisions; however, 
modern routers can use a "fast path", often taking advantage of hardware to accelerate 
processing, which limits that benefit. The flow label can still assist in more complex 
forwarding, such as ECMP routing and load balancing. 
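To make the ECMP use more concrete, the following is a minimal sketch, not taken from any specification or router implementation, of how a network node might choose among equal-cost next hops. The `Packet` record and function names are hypothetical, and SHA-256 is used purely for illustration (forwarding hardware typically uses much cheaper hash functions). When the flow label is non-zero, the sketch hashes the source address, destination address, and flow label. When the label is zero, it falls back to the transport 5-tuple, which is the kind of workaround discussed in the lessons below.

```python
import hashlib
from typing import NamedTuple, Sequence


class Packet(NamedTuple):
    """Hypothetical view of the fields a forwarding node might inspect."""
    src: str          # IPv6 source address (textual form)
    dst: str          # IPv6 destination address (textual form)
    flow_label: int   # 20-bit IPv6 Flow Label; 0 means "not set"
    proto: int        # transport protocol number, e.g., 6 for TCP
    sport: int        # transport source port
    dport: int        # transport destination port


def _hash(*fields: object) -> int:
    # Illustrative only: derive a 64-bit integer from the selected fields.
    data = "|".join(str(f) for f in fields).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def select_next_hop(pkt: Packet, next_hops: Sequence[str]) -> str:
    if not next_hops:
        raise ValueError("no next hops available")
    if pkt.flow_label != 0:
        # Network-layer key: no need to parse the transport header.
        key = _hash(pkt.src, pkt.dst, pkt.flow_label)
    else:
        # Fallback: assume transport port semantics (a layering violation).
        key = _hash(pkt.src, pkt.dst, pkt.proto, pkt.sport, pkt.dport)
    return next_hops[key % len(next_hops)]
```

Because the label-based key ignores port numbers, all packets of a labeled flow stay on one path even when the node cannot locate the ports, for example with IPsec ESP or long extension header chains.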


Although [RFC6437] recommended that endpoints should by default choose uniformly 
distributed labels for their traffic, the specification permitted an endpoint to choose to set a zero 
value. This ability of endpoints to choose to set a flow label of zero has had consequences on 
deployability: 


* Before wide-scale support by endpoints, it would be impossible to rely on a non-zero flow 
label being set. Network nodes would therefore also need to employ other techniques to 
realize equivalent functions. One example is a method that assumes the semantics of the 
transport source port field to provide entropy input to a network-layer hash. This use of a 
5-tuple to classify a packet represents a layering violation [RFC6294]. Once such other methods 
have been deployed, they increase the cost of deploying standards-based methods, even though 
they may offer less control to endpoints and may interact with other uses or interpretations of 
the field. 

* Even though the flow label is specified as an end-to-end field, some network paths have been 
observed to not transparently forward the flow label. This could result from non-conformant 
equipment or could indicate that some operational networks have chosen to reuse the 
protocol field for other (e.g., internal) purposes. This results in a lack of transparency and a 
deployment hurdle for endpoints expecting that they can set a flow label that is utilized by the 
network. The more recent practice of "greasing" [GREASE] suggests that a different 
outcome could have been achieved if endpoints had always been required to set a non-zero 
value. 

* [RFC1809] noted that the choice of flow label value can depend on the 
expectations of the traffic generated by an application, which suggests that an API should be 
presented to control the setting or the policy that is used. However, many currently available 
APIs do not provide this support. 
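[RFC6437] recommends that endpoints choose uniformly distributed labels by default. One way an endpoint stack might do that, sketched here under our own assumptions rather than as the algorithm of any particular stack, is a keyed hash over the transport 5-tuple. The label then stays constant for the life of a flow, varies across flows, and is hard to predict without the per-host secret. The class and method names are hypothetical. The sketch also adopts the "always set a non-zero value" policy suggested by the greasing discussion above, remapping a zero result to 1, which introduces a negligible bias.

```python
import hashlib
import hmac
import secrets
from typing import Optional

FLOW_LABEL_MASK = 0xFFFFF  # the IPv6 Flow Label field is 20 bits wide


class FlowLabeler:
    """Hypothetical per-host generator of stable, pseudo-random flow labels."""

    def __init__(self, key: Optional[bytes] = None) -> None:
        # Per-host secret; a fresh random key is generated if none is supplied.
        self._key = key if key is not None else secrets.token_bytes(16)

    def label_for(self, src: str, dst: str, proto: int, sport: int, dport: int) -> int:
        # Keyed hash of the 5-tuple: same flow gives the same label,
        # different flows are spread across the 20-bit label space.
        msg = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
        digest = hmac.new(self._key, msg, hashlib.sha256).digest()
        label = int.from_bytes(digest[:4], "big") & FLOW_LABEL_MASK
        # Assumed "always non-zero" policy: never emit the "unset" value 0.
        return label if label != 0 else 1


labeler = FlowLabeler()
label = labeler.label_for("2001:db8::1", "2001:db8::2", 6, 40000, 443)
```

Keeping the label a function of the 5-tuple means the stack needs no per-flow state, but it also fixes the granularity at one label per transport connection; an API that let applications choose a coarser aggregation would need more than this.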


A growth in the use of encrypted transports (e.g., QUIC [RFC9000]) seems likely to raise issues 
similar to those discussed above and could motivate renewed interest in utilizing the flow label. 


6.9. Explicit Congestion Notification (ECN) 
The suggested references for Explicit Congestion Notification (ECN) are: 


* "Recommendations on Queue Management and Congestion Avoidance in the Internet" [RFC2309] 
* "A Proposal to add Explicit Congestion Notification (ECN) to IP" [RFC2481] 
* "The Addition of Explicit Congestion Notification (ECN) to IP" [RFC3168] 
* "Implementation Report on Experiences with Various TCP RFCs" [vista-impl], slides 6 and 7 
* "Implementation and Deployment of ECN" (at [SallyFloyd]) 


In the early 1990s, the large majority of Internet traffic used TCP as its transport protocol, but 
TCP had no way to detect path congestion before the path was so congested that packets were 
being dropped. These congestion events could affect all senders using a path, either by "lockout", 
where long-lived flows monopolized the queues along a path, or by "full queues", where queues 
remain full, or almost full, for a long period of time. 


In response to this situation, "Active Queue Management" (AQM) was deployed in the network. A 
number of AQM disciplines have been deployed, but one common approach was that routers 
dropped packets when a threshold buffer length was reached, so that transport protocols like 
TCP that were responsive to loss would detect this loss and reduce their sending rates. Random 
Early Detection (RED) was one such proposal in the IETF. As the name suggests, a router using 
RED as its AQM discipline would, upon detecting that the time-averaged queue length had 
passed a threshold, probabilistically choose incoming packets to be dropped [RFC2309]. 
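The RED behavior described above can be sketched as follows. This is a simplified illustration under our own assumptions: the class name and the threshold and weight defaults are placeholders, not recommendations, and the sketch omits refinements of the original RED algorithm, such as scaling the drop probability by the number of packets since the last drop and special handling of idle periods.

```python
import random
from dataclasses import dataclass, field


@dataclass
class RedQueue:
    """Hypothetical, simplified RED drop decision."""
    min_th: float = 5.0    # packets; below this average, never drop
    max_th: float = 15.0   # packets; at or above this average, always drop
    max_p: float = 0.1     # drop probability as the average reaches max_th
    weight: float = 0.002  # EWMA weight applied to each new queue sample
    avg: float = 0.0       # time-averaged queue length
    rng: random.Random = field(default_factory=random.Random)

    def update_average(self, queue_len: int) -> float:
        # Exponentially weighted moving average smooths out short bursts.
        self.avg = (1.0 - self.weight) * self.avg + self.weight * queue_len
        return self.avg

    def drop_probability(self) -> float:
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        # Linear ramp from 0 to max_p between the two thresholds.
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)

    def should_drop(self, queue_len: int) -> bool:
        # Called per arriving packet with the instantaneous queue length.
        self.update_average(queue_len)
        return self.rng.random() < self.drop_probability()
```

With these defaults, an average queue halfway between the thresholds (10 packets) gives a drop probability of 0.05. Averaging is what lets RED tolerate bursts while still reacting to persistent queue growth.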


Researchers suggested providing "explicit congestion notifications" to senders when routers 
along the path detected that their queues were building, giving senders an opportunity to "slow 
down" as if a loss had occurred, giving path queues time to drain, while the path still had 
sufficient buffer capacity to accommodate bursty arrivals of packets from other senders. This 
was proposed as an experiment in [RFC2481] and standardized in [RFC3168]. 
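As a sketch of how a router can substitute a congestion mark for a drop, the hypothetical function below acts on the TOS / Traffic Class octet once the AQM (for example, RED) has selected a packet to carry a congestion signal. It uses the ECN codepoints as standardized in [RFC3168]: Not-ECT 00, ECT(1) 01, ECT(0) 10, and CE 11, carried in the two least significant bits of the octet. The experimental specification [RFC2481] used the same two bits but treated them as separate ECT and CE flags.

```python
from enum import Enum
from typing import Tuple

ECN_MASK = 0b11  # ECN field: the two least significant bits of the octet
NOT_ECT = 0b00   # transport is not ECN-capable
ECT_1 = 0b01     # ECN-Capable Transport, codepoint ECT(1)
ECT_0 = 0b10     # ECN-Capable Transport, codepoint ECT(0)
CE = 0b11        # Congestion Experienced


class Action(Enum):
    FORWARD = "forward"
    DROP = "drop"


def signal_congestion(tos: int) -> Tuple[Action, int]:
    """Return the action and the (possibly rewritten) TOS/Traffic Class octet.

    Called only after the AQM has decided this packet should signal congestion.
    """
    ecn = tos & ECN_MASK
    if ecn == NOT_ECT:
        # The sender cannot understand a mark, so fall back to a drop.
        return Action.DROP, tos
    # ECT(0), ECT(1), or already CE: forward with CE set; DSCP bits untouched.
    return Action.FORWARD, tos | CE
```

On receipt of CE, the receiving TCP echoes the signal to the sender (using the ECE flag in [RFC3168]), and the sender reduces its congestion window as it would for a loss, without a packet having been dropped.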


A key aspect of ECN was the use of IP header fields rather than IP options to carry explicit 
congestion notifications, since the proponents recognized that 


Many routers process the "regular" headers in IP packets more efficiently than they process 
the header information in IP options. 


Unlike most of the Path Aware technologies included in this document, the story of ECN 
continues to the present day and has produced a large number of Lessons Learned along the 
way. The early history of ECN (non-)deployment provides Lessons Learned that were not 
captured by other contributions in Section 6, so that is the emphasis of this section of the 
document. 


6.9.1. Reasons for Non-deployment 


ECN deployment relied on three factors: support in client implementations, support in router 
implementations, and deployment decisions in operational networks. 


The proponents of ECN did so much right, anticipating many of the Lessons Learned now 
recognized in Section 4. They recognized the need to support incremental deployment (Section 
4.2). They considered the impact on router throughput (Section 4.8). They even considered trust 
issues between end nodes and the network, for both non-compliant end nodes (Section 4.10) and 
non-compliant routers (Section 4.9). 


They were rewarded with ECN being implemented in major operating systems, for both end 
nodes and routers. A number of implementations are listed under "Implementation and 
Deployment of ECN" at [SallyFloyd]. 
What they did not anticipate was routers that would crash when they saw a non-zero value in 
bits 6 and 7 of the IPv4 Type of Service (TOS) octet [RFC0791] / IPv6 Traffic Class field 
[RFC2460]; these bits had been "Currently Unused" until [RFC2481] redefined them to carry ECN. 


As described in [vista-impl] ("IGD" stands for "Intermediate Gateway Device"), 


IGD problem #1: one of the most popular versions from one of the most popular 
vendors. When a data packet arrives with either ECT(0) or ECT(1) (indicating successful 
ECN capability negotiation) indicated, router crashed. Cannot be recovered at TCP layer 
[sic] 


The end-node implementation described in [vista-impl], which would be run on a significant 
percentage of Internet end nodes, was shipped with ECN disabled, as was true for several of 
the other implementations listed under "Implementation and Deployment of ECN" at 
[SallyFloyd]. Even after router vendors fixed these implementations, ECN remained disabled on 
end nodes. Given the trade-off between the benefits of enabling ECN (somewhat better behavior 
during congestion) and the risks of enabling ECN (possibly crashing a router somewhere along 
the path), ECN tended to stay disabled by default for decades afterwards, even on 
implementations that supported it. 


6.9.2. Lessons Learned 


Of the contributions included in Section 6, ECN may be unique in providing these lessons: 


* Even if you do everything right, you may trip over implementation bugs in devices you know 
nothing about, and those bugs can cause severe problems that prevent successful deployment 
of your Path Aware technology. 


* After implementations disable your Path Aware technology, it may take years, or even 
decades, to convince implementers to re-enable it by default. 


These two lessons, taken together, could be summarized as "you get one chance to get it right." 


During discussion of ECN at [PANRG-110], we noted that "you get one chance to get it right" isn't 
quite correct today, because operating systems on so many host systems are frequently updated, 
and transport protocols like QUIC [RFC9000] are being implemented in user space and can be 
updated without touching installed operating systems. Neither of these factors was true in the 
early 2000s. 


We think that these restatements of the ECN Lessons Learned are more useful for current 
implementers: 


* Even if you do everything right, you may trip over implementation bugs in devices you know 
nothing about, and those bugs can cause severe problems that prevent successful deployment 
of your Path Aware technology. Testing before deployment isn't enough to ensure successful 
deployment. It is also necessary to "deploy gently", which often means deploying for a small 
subset of users to gain experience and implementing feedback mechanisms to detect that 
user experience is being degraded. 
* After implementations disable your Path Aware technology, it may take years, or even 
decades, to convince implementers to re-enable it by default. This might be based on the 
difficulty of distributing implementations that enable it by default, but it is just as likely to be 
based on the "bad taste in the mouth" that implementers have after an unsuccessful 
deployment attempt that degraded user experience. 


With these expansions, the two lessons, taken together, could be more helpfully summarized as 
"plan for failure" -- anticipate what your next step will be, if initial deployment is unsuccessful. 
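The "deploy gently" and "plan for failure" advice can be illustrated with a small sketch. Everything here is hypothetical: the class, the thresholds, and the `try_connect` callback stand in for whatever feature-flag and telemetry machinery an implementer actually has. The sketch offers a Path Aware feature (ECN, say) to a stable, hash-selected fraction of clients, retries without the feature when an attempt fails, and turns the feature off for everyone if the observed failure rate crosses a threshold.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass
class GentleRollout:
    """Hypothetical staged rollout of a Path Aware feature with a kill switch."""
    feature: str
    percent: float                  # share of clients (0-100) offered the feature
    max_failure_rate: float = 0.05  # placeholder threshold, not a recommendation
    min_samples: int = 100          # do not judge until this many attempts
    attempts: int = 0
    failures: int = 0
    disabled: bool = False

    def _bucket(self, client_id: str) -> float:
        # Stable value in [0, 100): a client always lands in the same bucket.
        digest = hashlib.sha256(f"{self.feature}:{client_id}".encode()).digest()
        return int.from_bytes(digest[:4], "big") / 2**32 * 100.0

    def enabled_for(self, client_id: str) -> bool:
        return not self.disabled and self._bucket(client_id) < self.percent

    def report(self, success: bool) -> None:
        # Feedback mechanism: disable globally once degradation is observed.
        self.attempts += 1
        if not success:
            self.failures += 1
        if (self.attempts >= self.min_samples
                and self.failures / self.attempts > self.max_failure_rate):
            self.disabled = True


def connect_with_fallback(rollout: GentleRollout, client_id: str,
                          try_connect: Callable[[bool], bool]) -> bool:
    """try_connect(use_feature) is a hypothetical callback returning success."""
    if rollout.enabled_for(client_id):
        ok = try_connect(True)
        rollout.report(ok)
        if ok:
            return True
        # The feature appears to have broken this path: retry without it.
    return try_connect(False)
```

The per-connection fallback protects individual users from a broken path, while the global kill switch keeps a bad experience from spreading; the fallback half is what was missing when a single buggy device could crash.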


ECN deployment was also hindered by non-deployment of AQM in many devices, because of 
operator interest in QoS features provided in the network, rather than using the network to 
assist end systems in providing for themselves. But that's another story, and the AQM Lessons 
Learned are already covered in other contributions in Section 6. 


7. Security Considerations 


This document describes Path Aware techniques that were not adopted and widely deployed on 
the Internet, so it doesn't affect the security of the Internet. 


If this document meets its goals, we may develop new techniques for Path Aware networking 
that would affect the security of the Internet, but security considerations for those techniques 
will be described in the corresponding RFCs that specify them. 


8. IANA Considerations 


This document has no IANA actions. 